Example: Multi-metric runs¶
This example shows how to evaluate an atom's pipeline on multiple metrics.
Import the abalone dataset from the UCI Machine Learning Repository. The goal of this regression task is to predict the number of rings (a proxy for age) of abalone shells from physical measurements.
Load the data¶
In [1]:

# Import packages
import pandas as pd
from atom import ATOMRegressor
In [2]:

# Load data
X = pd.read_csv("docs_source/examples/datasets/abalone.csv")

# Let's have a look
X.head()
Out[2]:
|   | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
Run the pipeline¶
In [3]:

atom = ATOMRegressor(X, n_jobs=1, verbose=2, random_state=1)
<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Train set size: 3342
Test set size: 835
-------------------------------------
Memory: 300.88 kB
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 189 (0.6%)
In [4]:

atom.encode()
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
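As the log shows, the Sex feature is expanded into one binary column per class. A minimal sketch of the same idea using plain pandas (a hypothetical toy frame for illustration, not ATOM's internal encoder):

```python
import pandas as pd

# Toy frame mirroring the abalone Sex column (classes M, F, I)
df = pd.DataFrame({"Sex": ["M", "F", "I", "M"]})

# One binary column per class, conceptually what OneHot-encoding does
encoded = pd.get_dummies(df, columns=["Sex"])
print(encoded.columns.tolist())  # ['Sex_F', 'Sex_I', 'Sex_M']
```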
In [5]:

# For every step of the BO, both metrics are calculated,
# but only the first is used for optimization!
atom.run(
    models=["lsvm", "hGBM"],
    metric=("r2", "rmse"),
    n_trials=10,
    n_bootstrap=6,
)
Training ========================= >>
Models: lSVM, hGBM
Metric: r2, rmse


Running hyperparameter tuning for LinearSVM...

| trial | loss | C | dual | r2 | best_r2 | rmse | best_rmse | time_trial | time_ht | state |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | squared_epsilon_insen.. | 0.001 | True | 0.2887 | 0.2887 | -2.6528 | -2.6528 | 0.043s | 0.043s | COMPLETE |
| 1 | squared_epsilon_insen.. | 0.0534 | False | 0.3862 | 0.3862 | -2.5926 | -2.5926 | 0.043s | 0.086s | COMPLETE |
| 2 | squared_epsilon_insen.. | 0.0105 | True | 0.433 | 0.433 | -2.4084 | -2.4084 | 0.054s | 0.140s | COMPLETE |
| 3 | epsilon_insensitive | 0.6215 | True | 0.4022 | 0.433 | -2.5251 | -2.4084 | 0.045s | 0.185s | COMPLETE |
| 4 | squared_epsilon_insen.. | 0.0369 | False | 0.4057 | 0.433 | -2.5477 | -2.4084 | 0.040s | 0.225s | COMPLETE |
| 5 | epsilon_insensitive | 0.0016 | True | -1.5344 | 0.433 | -5.0102 | -2.4084 | 0.035s | 0.260s | COMPLETE |
| 6 | squared_epsilon_insen.. | 61.5811 | False | 0.4354 | 0.4354 | -2.3845 | -2.3845 | 0.034s | 0.294s | COMPLETE |
| 7 | squared_epsilon_insen.. | 14.898 | False | 0.4925 | 0.4925 | -2.2628 | -2.2628 | 0.035s | 0.329s | COMPLETE |
| 8 | epsilon_insensitive | 0.0252 | True | 0.3695 | 0.4925 | -2.6178 | -2.2628 | 0.035s | 0.364s | COMPLETE |
| 9 | squared_epsilon_insen.. | 0.0294 | True | 0.4767 | 0.4925 | -2.3896 | -2.2628 | 0.044s | 0.408s | COMPLETE |

Hyperparameter tuning ---------------------------
Best trial --> 7
Best parameters:
 --> loss: squared_epsilon_insensitive
 --> C: 14.898
 --> dual: False
Best evaluation --> r2: 0.4925   rmse: -2.2628
Time elapsed: 0.408s
Fit ---------------------------------------------
Train evaluation --> r2: 0.4592   rmse: -2.3795
Test evaluation --> r2: 0.4584   rmse: -2.3369
Time elapsed: 0.089s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.4577 ± 0.002   rmse: -2.3384 ± 0.0043
Time elapsed: 0.094s
-------------------------------------------------
Time: 0.592s


Running hyperparameter tuning for HistGradientBoosting...

| trial | loss | quantile | learning_rate | max_iter | max_leaf_nodes | max_depth | min_samples_leaf | l2_regularization | r2 | best_r2 | rmse | best_rmse | time_trial | time_ht | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | absolut.. | 0.1 | 0.0236 | 180 | 26 | 12 | 11 | 0.0 | 0.5373 | 0.5373 | -2.1398 | -2.1398 | 0.968s | 0.968s | COMPLETE |
| 1 | gamma | 0.5 | 0.242 | 160 | 38 | 3 | 20 | 0.0 | 0.574 | 0.574 | -2.1598 | -2.1398 | 0.160s | 1.128s | COMPLETE |
| 2 | quantile | 0.4 | 0.2448 | 210 | 12 | 3 | 25 | 0.3 | 0.4714 | 0.574 | -2.3253 | -2.1398 | 0.422s | 1.550s | COMPLETE |
| 3 | quantile | 0.6 | 0.017 | 480 | 28 | 16 | 13 | 0.1 | 0.5712 | 0.574 | -2.1385 | -2.1385 | 3.405s | 4.956s | COMPLETE |
| 4 | squared.. | 1.0 | 0.2649 | 70 | 10 | 10 | 28 | 0.8 | 0.5561 | 0.574 | -2.2019 | -2.1385 | 0.148s | 5.104s | COMPLETE |
| 5 | squared.. | 0.1 | 0.0283 | 360 | 32 | 9 | 11 | 0.5 | 0.5464 | 0.574 | -2.1197 | -2.1197 | 1.248s | 6.352s | COMPLETE |
| 6 | quantile | 0.4 | 0.1264 | 380 | 37 | 12 | 29 | 1.0 | 0.4416 | 0.574 | -2.3713 | -2.1197 | 3.002s | 9.354s | COMPLETE |
| 7 | gamma | 0.6 | 0.678 | 330 | 25 | 6 | 12 | 0.8 | 0.4299 | 0.574 | -2.3984 | -2.1197 | 0.739s | 10.092s | COMPLETE |
| 8 | absolut.. | 0.9 | 0.0831 | 280 | 42 | 16 | 10 | 1.0 | 0.5242 | 0.574 | -2.2742 | -2.1197 | 2.002s | 12.094s | COMPLETE |
| 9 | absolut.. | 0.6 | 0.0373 | 300 | 40 | 13 | 17 | 0.8 | 0.5685 | 0.574 | -2.17 | -2.1197 | 1.859s | 13.953s | COMPLETE |

Hyperparameter tuning ---------------------------
Best trial --> 5
Best parameters:
 --> loss: squared_error
 --> quantile: 0.1
 --> learning_rate: 0.0283
 --> max_iter: 360
 --> max_leaf_nodes: 32
 --> max_depth: 9
 --> min_samples_leaf: 11
 --> l2_regularization: 0.5
Best evaluation --> r2: 0.5464   rmse: -2.1197
Time elapsed: 13.953s
Fit ---------------------------------------------
Train evaluation --> r2: 0.7959   rmse: -1.4619
Test evaluation --> r2: 0.5479   rmse: -2.1351
Time elapsed: 1.470s
Bootstrap ---------------------------------------
Evaluation --> r2: 0.5259 ± 0.0154   rmse: -2.1861 ± 0.0352
Time elapsed: 7.930s
-------------------------------------------------
Time: 23.353s


Final results ==================== >>
Total time: 25.299s
-------------------------------------
LinearSVM            --> r2: 0.4577 ± 0.002    rmse: -2.3384 ± 0.0043
HistGradientBoosting --> r2: 0.5259 ± 0.0154   rmse: -2.1861 ± 0.0352 ~ !
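Note that rmse appears negated in the log: ATOM follows the scikit-learn convention that a greater score is always better. A small sketch of how the two metrics relate, using hypothetical targets and predictions (not values from the run above):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical targets and predictions for illustration
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])

r2 = r2_score(y_true, y_pred)
rmse = -np.sqrt(mean_squared_error(y_true, y_pred))  # negated: greater is better
print(round(r2, 4), round(rmse, 4))  # 0.925 -0.6124
```

Both scores are computed for every trial, but only the first metric in the tuple (here r2) drives the hyperparameter optimization.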
In [6]:

# Check the robustness of the pipeline using cross-validation
atom.winner.cross_validate()
Applying cross-validation...
Out[6]:
|   | train_r2 | test_r2 | train_rmse | test_rmse | time (s) |
|---|---|---|---|---|---|
| 0 | 0.796038 | 0.541990 | -1.453147 | -2.196943 | 1.392266 |
| 1 | 0.794954 | 0.540424 | -1.457709 | -2.196179 | 1.436932 |
| 2 | 0.790722 | 0.505922 | -1.492522 | -2.153457 | 1.444314 |
| 3 | 0.785317 | 0.580703 | -1.474827 | -2.189902 | 1.432303 |
| 4 | 0.795872 | 0.547917 | -1.461929 | -2.135072 | 1.747591 |
| mean | 0.792581 | 0.543391 | -1.468027 | -2.174311 | 1.490681 |
| std | 0.004114 | 0.023780 | 0.014222 | 0.025330 | 0.129719 |
Analyze the results¶
In [8]:

# The results dataframe contains a column for each metric
atom.results[["r2_ht", "r2_train", "r2_test", "rmse_ht", "rmse_train", "rmse_test"]]
Out[8]:
|   | r2_ht | r2_train | r2_test | rmse_ht | rmse_train | rmse_test |
|---|---|---|---|---|---|---|
| lSVM | 0.492530 | 0.4583 | 0.4552 | -2.262754 | -2.3815 | -2.3439 |
| hGBM | 0.546368 | 0.7183 | 0.4971 | -2.119672 | -1.7173 | -2.2518 |
In [9]:

# Some plots allow us to choose the metric we want to show
with atom.canvas():
    atom.plot_trials(metric="r2", title="Hyperparameter tuning performance for R2")
    atom.plot_trials(metric="rmse", title="Hyperparameter tuning performance for RMSE")
In [10]:

atom.plot_results(metric="r2")